Introduction: Astrobiology is a multidisciplinary
area of scientific research focused on studying the ori- 
gins of life on Earth and the conditions under which 
life might have emerged elsewhere in the universe. 
NASA uses the results of Astrobiology research to 
help define targets for future missions that are search- 
ing for life elsewhere in the universe. 

The understanding of complex questions in Astro- 
biology requires integration and analysis of data span- 
ning a range of disciplines including biology, chemis- 
try, geology, astronomy and planetary science. How- 
ever, the lack of a centralized repository makes it diffi- 
cult for Astrobiology teams to share data and benefit 
from resultant synergies. Moreover, in recent years,
federal agencies have begun requiring that the results of
federally funded scientific research be available and
useful to the public and the science community.

The Astrobiology Habitable Environments Data- 
base (AHED), developed with a consolidated group of 
astrobiologists from different active research teams at 
NASA Ames Research Center, is designed to help
address these issues. AHED is a central, high-quality,
long-term data repository for mineralogical, textural, 
morphological, inorganic and organic chemical, iso- 
topic and other information pertinent to the advance- 
ment of the field of Astrobiology. 

Objectives: AHED aims to promote the field of 
Astrobiology and increase scientific returns from 
NASA funded research by enabling data sharing, col- 
laboration and exposure of non-NASA scientists to 
NASA research initiatives and missions. 

The main goal of AHED is the creation of a single 
repository that has the flexibility to deal with the
diversity of Astrobiology datasets, while allowing the
degree of standardization necessary for more rapid
database creation, fulfillment of data archiving mandates,
and data discovery and mining through efficient search.

Characteristics: AHED is a collection of data- 
bases storing information about samples, measure- 
ments, analyses and contextual information about field 
sites where samples were collected, the instruments or 
equipment used for analysis, and people and institu- 
tions involved in their collection. 

In coming versions, AHED will be structured
around a framework of metadata templates. A published
AHED metadata standard will sit at the highest
level of this scheme, defining the metadata requirements
of AHED subscribing databases (Fig. 1). Curation
groups and users will create a library of database tem- 
plates to allow other scientists and researchers to make 
compatible, but flexible, database designs tailored to 
their datasets. Eventually, the template system will 
allow these curators to publish their specifications in 
commonly accepted metadata formats such as the Dub- 
lin Core Initiative's metadata standard 
(http://dublincore.org). 
Figure 1. AHED framework of metadata templates and
AHED search scheme. 


All AHED databases will conform to the AHED 
metadata standard, allowing data mining and search 
through the AHED web portal (Fig. 2). 
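As a rough illustration of the template hierarchy described above, the following sketch checks whether a database template carries every field required by a top-level standard while still permitting dataset-specific extras. The field names and the `REQUIRED_FIELDS` set are hypothetical placeholders; the published AHED metadata standard is not reproduced here.

```python
# Hypothetical required fields; the actual AHED metadata standard may differ.
REQUIRED_FIELDS = {"sample_id", "collection_site", "collector", "instrument"}


def missing_fields(template_fields):
    """Return the required fields absent from a database template, sorted."""
    return sorted(REQUIRED_FIELDS - set(template_fields))


def conforms(template_fields):
    """A template conforms when it has every required field; extras are allowed."""
    return not missing_fields(template_fields)


if __name__ == "__main__":
    # A template that extends the standard with one dataset-specific field.
    mineralogy = ["sample_id", "collection_site", "collector", "instrument", "xrd_pattern"]
    # A template that omits two required fields.
    isotopes = ["sample_id", "collector", "delta_13c"]
    print(conforms(mineralogy))
    print(missing_fields(isotopes))
```

A check of this kind is one way repository software could flag databases that fall out of compliance when the standard changes.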

Infrastructure: AHED will provide public,
open access to Astrobiology-related research data
through a user-managed web portal implemented
using open-source software created by the Open
Data Repository (ODR). At the same time, the
public definition of the AHED metadata standard
will allow other platforms and software to curate
datasets in a way that makes them discoverable and
searchable by the AHED web portal (Fig. 1). Tracking
and publishing changes to the AHED metadata
standard allows repository and database software to
prompt database curators and owners to keep their
databases in compliance with the latest version of the
AHED metadata standard.
Figure 2. Screenshot from AHED interface mockup.


ODR’s Data Publisher. Astrobiology researchers 
often have small communities or operate individually 
with unique data sets that don’t easily fit into existing 
database structures. ODR constructed its Data Publish- 
er software to allow researchers to create databases 
with common metadata structures and subsequently 
extend them to meet their individual needs and data 
requirements. The software accomplishes these tasks 
through a web-based interface that allows collaborative
creation and revision of common metadata templates 
and individual extensions to these templates for custom
data sets. Researchers can therefore search disparate
datasets on the common metadata established
through the metadata tools, while still storing distinct
analyses and data alongside the required common
metadata. The software produces web pages that can be
made publicly available at the researcher’s discretion, so
that users may search and browse the data. The aim is to
make interoperability and data discovery a human-friendly
task while also providing semantic data for machine-based
discovery.
Once relevant data has been identified, researchers can 
utilize the built-in application programming interface 
(API) that exposes the data for machine-based con- 
sumption and integration with existing data analysis 
tools (e.g. R, MATLAB, Project Jupyter).
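The idea of searching disparate datasets on shared metadata can be sketched as follows. This is a minimal illustration, not ODR's data model or API: the `Record` class, the `search` function, and all field names are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One dataset entry: shared metadata plus dataset-specific extensions."""
    common: dict
    extension: dict = field(default_factory=dict)


def search(records, **criteria):
    """Return records whose common metadata matches every given criterion."""
    return [
        r for r in records
        if all(r.common.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    records = [
        # Two datasets with different extensions but the same common fields.
        Record({"site": "Atacama", "sample_type": "soil"}, {"xrd_phases": ["gypsum"]}),
        Record({"site": "Atacama", "sample_type": "rock"}, {"delta_13c": -24.1}),
        Record({"site": "Yellowstone", "sample_type": "soil"}, {"ph": 3.2}),
    ]
    for hit in search(records, site="Atacama"):
        print(hit.common, hit.extension)
```

Because the query touches only the common fields, records with entirely different extensions remain searchable together.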

ODR Functionality. 

Drag-and-drop procedure: From the master template, 
administrators of databases can add different field 
types and modify the layout at any time during the 
lifetime of the database. 

Graphing system: The ODR platform provides a
variety of graph types based on PlotlyJS
(https://plot.ly/javascript/) (Fig. 3). To keep page load
times low while still allowing many charts on a page, the
graphing system creates pre-rendered, static versions of
each chart and stores them for display. Once the page is
loaded, a user can click on any pre-rendered graph to switch
to an interactive display that allows zooming, focusing on a
single point or line, and many other features.

Large file upload: To enable browser-based large file
uploads, the system uses Flow.js
(https://github.com/flowjs/flow.js). Users can also
upload multiple large files simultaneously, which allows us
to support researchers who work with large data sets
(e.g. genetic sequencing data and high-resolution
images).
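Flow.js handles large uploads by splitting each file into chunks in the browser. The sketch below illustrates that general chunking idea in Python; it does not use or mirror the Flow.js API, and the function names and chunk size are assumptions chosen for the example.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield (index, bytes) pairs until the stream is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data
        index += 1


def reassemble(chunks):
    """Rebuild the payload from chunks that may arrive out of order."""
    return b"".join(data for _, data in sorted(chunks, key=lambda c: c[0]))


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # stand-in for a large file
    chunks = list(iter_chunks(io.BytesIO(payload), chunk_size=1000))
    # Simulate out-of-order arrival on the server side.
    restored = reassemble(reversed(chunks))
    print(len(chunks), restored == payload)
```

Indexing each chunk lets a receiver tolerate out-of-order arrival and retry only the pieces that failed.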

CSV import: ODR provides a CSV import function
that automatically generates the template and populates
the database from a spreadsheet, allowing large sets of
data to be imported into the system in a very short
time.
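One plausible way to derive a template from a spreadsheet is to read the header row as field names and guess each field's type from its values. The sketch below assumes exactly that; it is not ODR's import code, and the type names (`integer`, `decimal`, `text`) and sample columns are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Guess a field type from sample values: integer, decimal, or text."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"


def import_csv(text):
    """Return (template, rows): column name -> inferred type, plus the raw rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    template = {
        col: infer_type([(row[col] or "") for row in rows])
        for col in (reader.fieldnames or [])
    }
    return template, rows


if __name__ == "__main__":
    sheet = "sample_id,depth_cm,ph,site\nA1,10,7.2,Yellowstone\nA2,25,6.8,Atacama\n"
    template, rows = import_csv(sheet)
    print(template)
    print(len(rows))
```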
Figure 3. Example of graphs plotted in ODR.


Permission system: A powerful and versatile per- 
mission system protects confidentiality and helps pre- 
serve data integrity and provenance by ensuring only 
the users who are authorized can see data and make 
changes. 
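A permission check of this kind might look like the sketch below, which assumes three access levels and per-user grants. The `Access` levels, the `Dataset` class, and the user names are illustrative assumptions and do not describe ODR's actual permission model.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    NONE = 0
    VIEW = 1
    EDIT = 2


class Dataset:
    def __init__(self, owner, public=False):
        self.owner = owner
        self.public = public
        self.grants = {}  # user name -> Access level

    def grant(self, user, level):
        self.grants[user] = level

    def access_for(self, user):
        """Owners can always edit; public datasets give everyone at least VIEW."""
        if user == self.owner:
            return Access.EDIT
        granted = self.grants.get(user, Access.NONE)
        floor = Access.VIEW if self.public else Access.NONE
        return max(granted, floor)

    def can_view(self, user):
        return self.access_for(user) >= Access.VIEW

    def can_edit(self, user):
        return self.access_for(user) >= Access.EDIT


if __name__ == "__main__":
    ds = Dataset(owner="alice")
    ds.grant("bob", Access.VIEW)
    print(ds.can_view("bob"), ds.can_edit("bob"), ds.can_view("carol"))
    ds.public = True  # publishing opens read access without granting edits
    print(ds.can_view("carol"), ds.can_edit("carol"))
```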

Citation: A citation system will allow research 
data to be used and appropriately referenced by oth- 
er researchers after the data are made public. 